Statistical Evaluation of the Doughnut Clustering Method for Product Affinity Segmentation
Authors
Abstract
Product affinity segmentation is a powerful technique that helps marketers and sales professionals understand customers’ needs, preferences, and purchase behavior. It is challenging in practice because product-level data usually have high skewness, high kurtosis, and a large percentage of zero values. The Doughnut clustering method was presented at SAS Global Forum 2013 and shown to be effective on real data (Baer & Chakraborty, 2013). However, the method is not a panacea for the product affinity segmentation problem. A comprehensive evaluation is needed in order to develop generic guidelines for practitioners on when to apply it. In this paper, we address this need by evaluating the Doughnut clustering method on simulated data with different levels of skewness, kurtosis, and percentage of zero values. We developed a five-step approach based on Fleishman’s power method to generate synthetic data with prescribed parameters. We then designed and conducted a set of experiments that apply the Doughnut clustering method to the simulated data, with the traditional K-means method as a benchmark. We draw conclusions on the performance of the Doughnut clustering method by comparing its clustering validity metric, “the ratio of between-cluster variance to within-cluster variance”, and its relative proportion of cluster sizes against those of K-means. In certain data situations, the Doughnut clustering method is shown to produce an acceptable clustering solution when other approaches fail.
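The two comparison criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this example assumes the ratio is the between-cluster sum of squares divided by the within-cluster sum of squares, with no degrees-of-freedom adjustment. The function names `variance_ratio` and `cluster_proportions` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def variance_ratio(X, labels):
    """Between-cluster sum of squares divided by within-cluster sum of squares.

    Assumption: unadjusted sums of squares; the paper may normalize differently.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        # Spread of this cluster's centroid around the overall mean, weighted by size
        between += len(members) * np.sum((centroid - grand_mean) ** 2)
        # Spread of this cluster's points around their own centroid
        within += np.sum((members - centroid) ** 2)
    return between / within

def cluster_proportions(labels):
    """Share of observations that falls in each cluster, ordered by label."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

# Toy data: two well-separated clusters of two points each (illustrative only)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

ratio = variance_ratio(X, labels)
props = cluster_proportions(labels)
print(ratio, props)
```

Under this assumed definition, a higher ratio indicates tighter and better-separated clusters, while the proportions reveal whether one cluster absorbs nearly all observations, a plausible failure mode on zero-heavy data. The function does not handle the degenerate case in which the within-cluster sum of squares is zero.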
Related resources
Cluster-Based Image Segmentation Using Fuzzy Markov Random Field
Image segmentation is an important task in image processing and computer vision that attracts the attention of many researchers. Pixels in an image carry two kinds of information: statistical and structural, which refer to the feature values of pixel data and the local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...
Image Segmentation Based on Fast Normalized Cut
In this paper, we propose a fast image segmentation method based on normalized cut. This method applies the simple linear iterative clustering super-pixel algorithm to obtain super-pixel regions, and then uses affinity propagation clustering to extract the representative pixels in each super-pixel region. Finally, we apply normalized cut to obtain segmentation results. At the end of the paper, Numeri...
Image Segmentation: Type–2 Fuzzy Possibilistic C-Mean Clustering Approach
Image segmentation is an essential issue in image description and classification. Currently, in many real applications, segmentation is still mainly manual or strongly supervised by a human expert, which makes it irreproducible and deteriorating. Moreover, images contain many uncertainties and much vagueness, which crisp clustering and even Type-1 fuzzy clustering cannot handle. Hence, Type-...
Image Segmentation using SLIC Superpixels and Affinity Propagation Clustering
In this paper, we propose a new method of image segmentation, named SLICAP, which combines the simple linear iterative clustering (SLIC) method with the affinity propagation (AP) clustering algorithm. First, the SLICAP technique uses the SLIC superpixel algorithm to form an over-segmentation of an image. Then, a similarity is constructed based on the features of superpixels. Finally, the AP alg...
Detection of lung cancer using CT images based on novel PSO clustering
Lung cancer is one of the most dangerous diseases and causes a large number of deaths. Early detection and analysis can be very helpful for successful treatment. Image segmentation plays a key role in the early detection and diagnosis of lung cancer. The K-means algorithm and classic PSO clustering are the most common methods for segmentation, but they produce poor outputs. In t...
Publication date: 2015